Search results

Article

Publication date: 1 August 2016

Integration and optimization of multiple big data processing platforms

Bao-Rong Chang, Hsiu-Fen Tsai, Yun-Che Tsai, Chin-Fu Kuo and Chi-Chung Chen

Downloads

501

Abstract

Purpose

The purpose of this paper is to integrate and optimize a multiple big data processing platform with the features of high performance, high availability and high scalability in big data environment.

Design/methodology/approach

First, Apache Hive, Cloudera Impala and BDAS Shark are integrated so that the platform supports SQL-like queries. Next, users access a single interface, and the proposed optimizer automatically selects the best-performing big data warehouse platform. Finally, the distributed memory storage system Memcached is incorporated into the distributed file system, Apache HDFS, to cache query results quickly. As a result, when users repeat the same SQL command, the result is returned rapidly from the cache instead of being retrieved again through a slower search of the big data warehouse.
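The caching step can be illustrated with a minimal sketch. It is not the authors' implementation: a plain dictionary stands in for Memcached, and `QueryCache`, `key_for` and `run_query` are hypothetical names introduced here. The SQL text is hashed to form the key because Memcached keys are limited to 250 bytes and may not contain whitespace.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple]


class QueryCache:
    """Cache query results by SQL text (a dict stands in for Memcached)."""

    def __init__(self, run_query: Callable[[str], Rows]) -> None:
        self._store: Dict[str, Rows] = {}   # assumption: in-process stand-in
        self._run_query = run_query         # hypothetical warehouse call
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key_for(sql: str) -> str:
        # Hash the trimmed SQL so the key is short and free of whitespace.
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def query(self, sql: str) -> Rows:
        key = self.key_for(sql)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: run the query on the warehouse and remember the result.
        self.misses += 1
        rows = self._run_query(sql)
        self._store[key] = rows
        return rows


if __name__ == "__main__":
    calls = []

    def fake_warehouse(sql: str) -> Rows:
        calls.append(sql)
        return [(1, "a"), (2, "b")]

    cache = QueryCache(fake_warehouse)
    cache.query("SELECT * FROM t")
    cache.query("SELECT * FROM t")   # served from the cache
    print(cache.hits, cache.misses, len(calls))
```

Only the first call reaches the backend; the repeated command is answered from the cache, which is the behaviour the abstract describes for repeated SQL commands.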

Findings

As a result, the proposed approach significantly improves overall performance and dramatically reduces search time when querying a database, especially for frequently repeated SQL commands in multi-user mode.

Research limitations/implications

Currently, Shark’s latest stable version, 0.9.1, does not support the latest versions of Spark and Hive. In addition, this software stack supports only Oracle JDK 7; using Oracle JDK 8 or OpenJDK causes serious errors, and some of the software will not run.

Practical implications

One problem with this system is that some blocks go missing when too many blocks are stored in one result (about 100,000 records). Another is that writing sequentially into the in-memory cache wastes time.

Originality/value

When the remaining memory on each server is 2 GB or less, Impala and Shark swap pages heavily, which causes extremely low performance. With larger data sets, this may trigger a JVM I/O exception and crash the program. When enough memory remains, however, Shark is faster than Hive and Impala. Impala’s memory consumption lies between that of Shark and that of Hive, so a moderate amount of remaining memory is enough for Impala to reach its maximum performance. In this study, each server allocates 20 GB of memory for cluster computing, and the critical points for the amount of remaining memory are set at Level 1: 3 percent (0.6 GB), Level 2: 15 percent (3 GB) and Level 3: 75 percent (15 GB). The program automatically selects Hive when the remaining memory is less than 15 percent, Impala at 15 to 75 percent and Shark at more than 75 percent.
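The selection rule stated above can be written as a short sketch. The function name `select_platform` and its parameters are hypothetical; only the 20 GB allocation and the 15 and 75 percent thresholds come from the abstract. How the remaining memory is measured is not described there, so it is passed in as a number.

```python
def select_platform(free_gb: float, total_gb: float = 20.0) -> str:
    """Pick a query engine from the fraction of memory still free.

    Thresholds follow the abstract: below 15 percent -> Hive,
    15 to 75 percent -> Impala, above 75 percent -> Shark.
    """
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    if not 0 <= free_gb <= total_gb:
        raise ValueError("free_gb must lie between 0 and total_gb")

    ratio = free_gb / total_gb
    if ratio < 0.15:
        return "hive"
    if ratio <= 0.75:
        return "impala"
    return "shark"


if __name__ == "__main__":
    for free in (0.6, 3.0, 15.0, 18.0):
        print(free, "GB free ->", select_platform(free))
```

Under this reading, the boundary values of exactly 15 and exactly 75 percent both fall to Impala, because the abstract assigns Hive to "less than 15 percent" and Shark to "more than 75 percent".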

Details

Engineering Computations, vol. 33 no. 6

Type: Research Article

DOI:

ISSN: 0264-4401

Keywords

Article

Publication date: 1 August 2016

Energy-efficient scheduling algorithm for real-time job set

Chin-Fu Kuo, Yung-Feng Lu and Bao-Rong Chang

Downloads

102

Abstract

Purpose

The purpose of this paper is to investigate the scheduling problem of real-time jobs executing on a DVS processor. The jobs must complete by their deadlines, and energy consumption must be minimized.

Design/methodology/approach

A two-phase energy-efficient scheduling algorithm is proposed to solve the scheduling problem for real-time jobs. In the off-line phase, the maximum instantaneous total density and the instantaneous total density (ITD) are introduced to derive the processor speed for each time instant. The derived speeds are saved for use at run time. In the on-line phase, the authors set the processor speed according to the derived speeds and set a timer to expire at the end time of the speed currently in use.
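The two-phase structure can be sketched as follows. The abstract does not define ITD, so this sketch assumes a common notion of density: each job contributes its work divided by the length of its release-to-deadline window, and the total density at an instant is the sum over jobs whose window covers that instant. The names `Job`, `derive_speeds` and `speed_at` are hypothetical, and the maximum-density variant is not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Job:
    release: float
    deadline: float
    work: float  # assumed: execution requirement at unit speed


Segment = Tuple[float, float, float]  # (start, end, speed)


def derive_speeds(jobs: List[Job]) -> List[Segment]:
    """Off-line phase: one speed per interval between release/deadline points."""
    points = sorted({t for j in jobs for t in (j.release, j.deadline)})
    schedule: List[Segment] = []
    for start, end in zip(points, points[1:]):
        # Assumed density: work spread evenly over each job's own window.
        density = sum(
            j.work / (j.deadline - j.release)
            for j in jobs
            if j.release <= start and end <= j.deadline
        )
        schedule.append((start, end, density))
    return schedule


def speed_at(schedule: List[Segment], t: float) -> float:
    """On-line phase: look up the saved speed; it holds until the segment ends."""
    for start, end, speed in schedule:
        if start <= t < end:
            return speed
    return 0.0  # no job window covers this instant


if __name__ == "__main__":
    table = derive_speeds([Job(0, 4, 2), Job(2, 6, 2)])
    for segment in table:
        print(segment)
```

In a real system, the end of each segment is where the timer mentioned in the abstract would expire and the next saved speed would be applied; here a table lookup stands in for that timer.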

Findings

When the DVS processor executes a job at a proper speed, the energy consumption of the system can be minimized.

Research limitations/implications

This paper does not consider jobs with precedence constraints; these can be explored in future work.

Practical implications

Experimental results for the proposed schemes are presented to demonstrate their effectiveness.

Originality/value

The experimental results show that the proposed scheduling algorithm, ITD, can save energy and keep the processor fully utilized.

Details

Engineering Computations, vol. 33 no. 6

Type: Research Article

DOI:

ISSN: 0264-4401

Keywords
